ABSTRACT

Motivation: The recent development of high-throughput drug
profiling (high content screening or HCS) provides a large amount
of quantitative multidimensional data. Despite its potential, it poses
several challenges for academic and industry analysts alike. This is
especially true for ranking the effectiveness of several drugs directly
from many thousands of images. This paper introduces, for the first
time, a new framework for automatically ordering the performance
of drugs, called fractional adjusted bi-partitional score (FABS). This
general strategy takes advantage of graph-based formulations and
solutions and avoids many shortfalls of the methods traditionally used
in practice. We experimented with the FABS framework by implementing
it with a specific algorithm, a variant of normalized cut called
normalized cut prime (FABS-NC'), producing a ranking of drugs. This
algorithm is known to run in polynomial time and therefore can scale
well in high-throughput applications.

Results: We compare the performance of FABS-NC' to other
methods that could be used for drug ranking. We devise two
variants of the FABS algorithm: FABS-SVM, which utilizes a support
vector machine (SVM) as the black box, and FABS-Spectral, which
utilizes the eigenvector technique (spectral) as the black box. We also
compare the performance of FABS-NC' to three other methods that have
been previously considered: center ranking (Center), PCA ranking
(PCA), and graph transition energy method (GTEM). The conclusion
is encouraging: FABS-NC' consistently outperforms all five
alternatives. FABS-SVM has the second best performance among
these six methods, but is far behind FABS-NC': in some cases
FABS-NC' produces over half more correctly predicted ranking
experiment trials than FABS-SVM.

Availability: The system and data for the evaluation reported here 
will be made available upon request to the authors after this 
manuscript is accepted for publication. 
Contact: yxy128@berkeley.edu



1 INTRODUCTION 

Automated microscopy is increasingly used in drug discovery,
especially in predicting the toxicity of new drugs (Perlman and
Altschuler, 2004). The so-called high-content screening (HCS)
has greatly enhanced investigators' capability of discerning the
[...] accomplishes this by analyzing phenotypic features of the cells from
tens of thousands of cell images produced by HCS. In addition, the
decreasing cost of such a method means widespread application
(Lin et al., 2010). HCS employs cell imaging assays, tagged with
fluorescent dyes: each field of cells contains these tags for its
different macromolecules. Automated microscopy is performed to
produce a large amount of visual information.

There are three steps during this process (Mitchison, 2005a;
Yarrow et al., 2003): fluorescence-tagging, automated microscopy,
and identification and measurement of target phenotypic feature(s)
for further analysis. The analysis step usually poses the greatest
challenge. To extract meaning out of a gigantic image database,
traditional tools usually need to be tailored to the features of
specific known phenotypes, instead of unknown yet more informative
differences. For example, it has been reported that applying an
analysis method that only distinguishes phenotypic changes at the
cellular level misses meaningful morphological
modifications of subcellular structures (Taguchi et al., 2007; Zhou
and Wong, 2006).

In high-throughput drug screening assays, typically a
quantity, such as the normalized intensity of a reporter fluorescent
protein (Morelock et al., 2005), is assumed to be measurable.
Differences between samples of two distinct cell populations
(such as treated versus untreated) are estimated and tested for
significance. Methods using statistics like the Z'-factor (Zhang et al.,
1999) to evaluate the reliability of the measurements have been
developed. Comparison of the difference is usually done by
performing a multivariate F-test to test whether two populations
are distributed differently. But the F-test may introduce high errors
when the distributions are not normal, which is expected to be the
case in many types of cell responses. Moreover, in image-based
assays, the use of a measurable quantity is no longer applicable
when this quantity is not straightforward to obtain directly and the
measurement itself can never be perfect. For example, measuring
the composition of morphological subtypes of mitochondria
requires pattern recognition algorithms to accurately detect and
quantify target events (Peng et al., 2011). Though many advanced
algorithms have been developed over the years, these pattern recognition
algorithms usually require non-trivial tuning and optimization for
each study because they may generalize poorly, sometimes not
even within a well, due to noise and systematic bias
introduced during the sample preparation and imaging
steps. This induces additional overhead when attempts are made to
scale up the assay to high throughput.

Another challenge arises when a multiplex approach is required,
where multiple independent quantities are measured for each
single cell. In these cases, the response of each single cell will be a



© The Author(s) 2012. Published by Oxford University Press. 

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/ 
by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. 



Drug profiling by FABS 



multi-dimensional vector. How to measure the difference between these
vectors becomes an issue because simple Euclidean distance in the
multi-dimensional space may not serve the need. One solution is to
come up with an appropriate 'metric' to convert multi-dimensional
vectors into a scalar that reflects the difference. There is, however,
no generally applicable way to come up with this
metric. Usually, one or more dimensions in the vector come from an
imperfectly measured quantity, such as one that requires advanced
pattern recognition to extract automatically, as discussed in
the previous paragraph. Another issue is that our observation is the
result of sampling, which inevitably introduces sampling errors and
is further complicated by possible heterogeneous responses by cells.

The focus of this research is to address the issues mentioned above
for the application of HCS in drug ranking. Drug ranking refers to the 
ordering of a group of different drugs according to their effectiveness 
by certain criteria. One of the most used criteria is the relative
toxicity among drugs (Paull et al., 1992). Ideally, this provides
an important scale to assess the relative merit of each candidate drug.
However, each cell responds to a certain drug differently, thus
making the outcome of any ranking highly dependent on sampling
and noise. A conspicuous example is the fragmentation of cells or
organelles: the intact and the completely fragmented states are easy
to recognize while the degree of partial fragmentation is difficult
to gauge, thus often involving human experts and time-consuming
manual processes. This is infeasible for high-throughput screening
such as HCS (Lin et al., 2010; Peng et al., 2011).

Our objective is to develop an efficient and accurate ranking 
measure (metric learning) that can be used to order candidate 
drugs according to their effectiveness. To this end, we developed a 
framework called Fractional Adjusted Bi-partitional Score (FABS). 
This general strategy, introduced here for the first time, takes
advantage of graph-based formulations and solutions and avoids
many shortfalls of the methods traditionally used in practice. We use
such a scheme because graph-based construction works well in
several areas of data mining (Washio and Motoda, 2003), machine
learning (Jordan, 1996) and image processing (Hochbaum, 2001),
and a recent publication (Lin et al., 2010) also confirms its
usefulness in the HCS context.

In order to apply FABS to the images, we use a feature
extraction tool first presented in (Peng et al., 2011). This tool takes
cell images and outputs several vectors that represent important
geometric and other features of the target images; these vectors
are then used as inputs for computing FABS.

One feature of FABS is that it has, as part of the input and 
as training data, extreme cases labeled as positive and negative 
controls, which in our case are the intact and the completely 
fragmented states mentioned previously. The algorithm does not 
involve any training from in-between cases, which are hard to come 
by. This completely sidesteps the common problem of a laborious 
and time-consuming annotation step, performed by experts to assess 
the relative merit of drugs for a small sample of images used as a 
training group. Furthermore, our measure takes advantage of the
high-volume nature of the dataset, using all available images for
computation of FABS for each drug. This reduces the effect of noise
and sampling bias. This framework can potentially be used for any
task that requires quantifying subtle and implicit differences between
populations of high-dimensional feature vectors. By formulating the
problem as a bipartition problem as in FABS, there is no need to



solve an image-based drug ranking problem as a regression problem. 
Our preliminary formal analysis of FABS shows that the expected 
error and variance of the estimated scores by FABS will be within 
a manageable range given the classification error by the bipartition. 

To empirically evaluate our framework, we use the normalized cut prime
(NC') model and the respective algorithm recently introduced by
Hochbaum (2010c). That algorithm runs efficiently and is furthermore
combinatorial. This latter feature differentiates it from (Lin
et al., 2010), in which a spectral technique is used to achieve a
bipartitioning. Combinatorial solutions are superior to spectral
ones in several regards, such as being more efficient and accurate
(Hochbaum, 2010b,c), as shown in our experimental results.

2 METHODOLOGY 

This section presents a general framework for quantifying the
difference in morphological composition between populations of
cells. The proposed framework utilizes a procedure named FABS-A,
where A stands for a bipartition algorithm and FABS stands for
fractional adjusted bi-partitional score. We show that using certain
graph-theoretical formulations for the bipartition algorithm avoids many
shortfalls of the methods used in practice. Its importance lies in
teasing apart cell groups based on morphological composition and in
detecting whether or not such differences exist.

As previously mentioned, we use a feature extraction tool, capable 
of processing cell images with different dimensionalities (from 
static 2D to animated 3D with multiple channels) to generate high- 
dimensional (in our experiments, 134 D) output vectors, called 
feature vectors. Each feature vector corresponds to an image of a
single cell and contains measurements of the image characteristics,
such as the intensity of the image, the shape of a particular
object in the image, etc. Each group of cell images (and their
corresponding feature vectors) can be associated with a certain
population (e.g. populations representing cells to which a certain
drug has been applied).

The method proposed in this section, FABS-A, is capable of
receiving, as input, the feature vectors from cells representing
different populations and detecting and quantifying the differences
between these populations. For example, given the features extracted
from the mitochondrial images of two populations of cells, one
derived from diseased tissues and the other from healthy tissues,
FABS-A will tell us to what extent the fragmentation levels of
their mitochondria are different and estimate the significance of the
difference.

We then perform FABS-A on the processed feature vectors. The
input to FABS-A consists of the feature vectors, processed by principal
component analysis (PCA) to reduce the dimensionality of the original
data, each of which belongs to a certain population set $P_i$,
together with training data. The training data consist of feature vectors
belonging to two populations on the opposite ends of the spectrum,
$R_1$ and $R_2$. These two population sets represent positive and negative
controls, which in this experiment are the completely fragmented
and the completely intact mitochondria cell populations.

Computation of FABS-A, the details of which will be discussed
shortly, consists of three steps. The first step is to construct a graph
from the input data. The second step is to apply a blackbox algorithm
(A) to find a bipartition on the resulting graph. The third step is to
recover a scalar score for each population, based on the fraction of
the cases that fall on the side of the partition boundary (cut) that
contains positive controls. The blackbox can be any appropriate
bipartitioning algorithm available. The algorithm we propose to use
for the blackbox solves the normalized cut prime (NC') problem
(Hochbaum, 2010b). We refer to this algorithm as NC'. We shall
see in the 'Results' section that this bipartitioning algorithm, in
the context of FABS-A (FABS-NC'), outperforms the Support Vector
Machine (SVM) algorithm (FABS-SVM). This overall framework
provides a flexible general strategy for quantifying the differences
among population groups.

The major advantages of FABS-A include:

(1) it is capable of efficiently processing the high-dimensional
input data acquired from the images using the feature extraction
tool from (Peng et al., 2011);

(2) the generated output is one-dimensional, in that a single scalar 
score is generated for each population of multidimensional 
vectors. As such, the difference between the scores can be used 
to quantify population differences in an unambiguous way; 

(3) the calculation of the output scores is done in a way 
that reduces the effects of outliers in distinguishing cell 
populations; 

(4) unlike many statistical tests, it does not assume any underlying 
distribution for the populations; 

(5) the labeled training data set required is minimal and 
easily obtainable, requiring minimum intervention from the 
experts; and 

(6) it scales well in high-throughput applications. 

In what follows we describe the three steps of FABS-A in more
detail.

2.1 The FABS-A Algorithm
Step 1: graph construction 

As mentioned previously, the input to FABS-A consists of $n$
(pre-processed) feature vectors, $V=\{v_1,\ldots,v_n\}$, each associated with
an HCS image, obtained after feature extraction and PCA
pre-processing. This input includes $k$ population sets, $\{P_1,\ldots,P_k\}$. Each
population set in this case represents a set of feature vectors
corresponding to cells treated with a certain drug. Each feature
vector $v_i$ belongs to one of the population sets, indicating in this
case what drug has been applied to the particular cell the vector
is representing. The input also contains two training sets $\{R_1,R_2\}$,
representing the extreme cases such as the completely fragmented
and the completely intact mitochondria cell populations. In the graph
construction step of FABS-A, an undirected graph $G=(V,E,l,w)$
is created, where each node $v_i\in V$ corresponds to a feature vector.
The set of all possible pairs corresponds to the set of edges of the
graph, $E=V\times V$, which forms a complete graph. Each feature vector
$v_i$ is labeled with $l_{v_i}$, which is the index of the population set it
belongs to. The labeling function $l$ thus maps each
feature vector $v_i$ to its corresponding population set.
A weight function $w: V\times V\to\mathbb{R}^+$
associates with each pair of nodes $[i,j]$ (an edge) its
connection strength, that is, the similarity between the two
nodes. For each edge $[i,j]$, the weight $w_{ij}$ and the distance between
the two points $v_i$ and $v_j$ are inversely related: one goes up as the
other goes down. Equivalently, $w_{ij}$ and
the similarity between $v_i$ and $v_j$ go up or down together.
Several distance measures can be used for this purpose, among
them Euclidean, city block and Minkowski distances. Notice that
once these similarity measures are constructed, the dimensionality of
the vectors becomes irrelevant to our algorithm.
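
The graph construction step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the `scale` parameter and the exponential mapping from distance to weight are assumptions, since the text only requires that the weight decreases as the distance increases.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def build_similarity_graph(vectors, scale=1.0):
    """Return {(i, j): w_ij} for every pair i < j (a complete graph).

    ASSUMPTION: w_ij = exp(-d_ij / scale). Any positive weight that
    decreases with distance would satisfy the description in the text.
    """
    weights = {}
    n = len(vectors)
    for i in range(n):
        for j in range(i + 1, n):
            d = euclidean(vectors[i], vectors[j])
            weights[(i, j)] = math.exp(-d / scale)
    return weights

# Hypothetical 2-D feature vectors, for illustration only.
vectors = [(0.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
graph = build_similarity_graph(vectors)
```

Only pairwise weights are stored, so later steps never touch the original coordinates, which is why the dimensionality of the feature vectors stops mattering once the graph exists.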

Step 2: bipartitioning the graph using NC'

We first introduce some notation. Given a graph $G=(V,E)$, a
bipartition of the graph, or a cut, is defined as
$(S,\bar{S})=\{[i,j] \mid i\in S, j\in\bar{S}\}$, where $\bar{S}=V\setminus S$.
The capacity of a cut $(S,\bar{S})$ is defined as:

$$C(S,\bar{S}) = \sum_{i\in S,\ j\in\bar{S},\ [i,j]\in E} w_{ij}.$$

More generally, for any pair of sets $A,B\subset V$, the capacity of the cut
is denoted by $C(A,B)=\sum_{i\in A,\ j\in B} w_{ij}$. Similarly, the capacity of a
set $D\subseteq V$ is denoted by $C(D)=C(D,D)=\sum_{i,j\in D,\ [i,j]\in E} w_{ij}$.

As previously mentioned, in the second step of FABS-A we use
a blackbox algorithm to find a bipartition on the graph. A bipartition
algorithm aims at finding the cut that separates the graph into $S$
and $\bar{S}$, according to some underlying objective. There are many
different objectives that can be selected. For instance, the bipartition
algorithm for the well-known minimum cut problem is defined with
the goal of separating the graph into $S$ and $\bar{S}$ such that $C(S,\bar{S})$ is the
minimum among all possible non-empty subsets $S$ and $\bar{S}$. Since the
goal is to obtain a bipartition for the FABS-A calculation process,
any bipartition algorithm can be used as a blackbox. However, the
following extra requirement has to be imposed (either by the internal
working of the algorithm or by an external constraint).

Requirement 1. All positive controls $R_1$ must be in $S$ (or $\bar{S}$) and
all negative controls $R_2$ must be in $\bar{S}$ (or $S$).

For a particular blackbox implementation of FABS-A in Step 2
of Algorithm 1, we choose the previously mentioned bipartitioning
algorithm, called NC', and adjust it to guarantee that the constraint
listed in Requirement 1 is satisfied. The resulting FABS-NC'
is semi-supervised in nature and incorporates all information of
the corresponding graph. The NC' problem is defined as finding
$\min_{S\subset V} C(S,\bar{S})/C(S,S)$ on a given graph. This objective combines
the goal of minimizing the similarity between the two parts of the
bipartition, the quantity $C(S,\bar{S})$, with the goal of maximizing the
similarity between the elements of $S$. For a graph $G=(V,E)$, we
denote $NC'(G)=\min_{S\subset V} C(S,\bar{S})/C(S,S)$. An efficient algorithm
for this problem was given in (Hochbaum, 2010a,b,c).
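
To make the NC' objective concrete, the sketch below evaluates $C(S,\bar{S})/C(S,S)$ by exhaustive enumeration on a four-node toy graph while enforcing Requirement 1. This is not the polynomial-time algorithm cited above; brute force is used only because it is easy to verify on a tiny example. The function names and the toy weights are assumptions.

```python
from itertools import combinations

def capacity(weights, A, B):
    """C(A, B): total weight of edges with one endpoint in A, the other in B.

    `weights` maps frozenset({i, j}) to w_ij. When A == B, every edge
    inside the set is counted once.
    """
    total = 0.0
    for edge, w in weights.items():
        i, j = tuple(edge)
        if (i in A and j in B) or (j in A and i in B):
            total += w
    return total

def brute_force_nc_prime(nodes, weights, r1, r2):
    """Try every S with r1 inside S and r2 outside; return (best ratio, S)."""
    free = [v for v in nodes if v not in r1 and v not in r2]
    best = None
    for size in range(len(free) + 1):
        for extra in combinations(free, size):
            S = set(r1) | set(extra)
            S_bar = set(nodes) - S
            inside = capacity(weights, S, S)
            if inside == 0:
                continue  # ratio undefined when S has no internal edges
            ratio = capacity(weights, S, S_bar) / inside
            if best is None or ratio < best[0]:
                best = (ratio, S)
    return best

# Toy graph: nodes 0 and 1 are similar, nodes 2 and 3 are similar.
nodes = [0, 1, 2, 3]
weights = {frozenset(e): w for e, w in [
    ((0, 1), 4.0), ((2, 3), 4.0),
    ((0, 2), 0.5), ((0, 3), 0.5), ((1, 2), 0.5), ((1, 3), 0.5),
]}
ratio, S = brute_force_nc_prime(nodes, weights, r1={0}, r2={3})
```

Here node 0 plays the role of a positive control and node 3 of a negative control. Enumeration is exponential in the number of unlabeled nodes, so it is useful only for checking the objective on small cases.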

The polynomial time algorithm described in (Hochbaum, 2010b)
for NC' was based on showing that solving NC' is equivalent to
solving a certain parametric $s,t$-cut problem. In an $s,t$-cut problem,
a node $s$ of a graph is required to be on one side of the bipartition,
whereas the node $t$ is required to be on the opposite side.

In the adaptation of the parametric $s,t$-cut algorithm for the
FABS-A framework, the positive and negative control data are used
as seed nodes that are forced to join $s$ and $t$ in the graph. This is
achieved through setting the nodes in $R_1$ to be 'infinitely similar' to
the source node $s$, and the nodes of $R_2$ to be 'infinitely similar' to
the sink node $t$. In terms of the graph, this means that we add edges
of infinite weight between the source node $s$ and all nodes in $R_1$,
and edges of infinite weight between the nodes of $R_2$ and $t$.

Since NC' can be solved in the running time of a minimum
$s,t$-cut problem (Hochbaum, 2010b), our FABS-NC' implementation
is efficient, running in polynomial time. We later compare
the performance of FABS-NC' with FABS-SVM, where the
bipartitioning algorithm used is SVM, whose objective is to find a
high-dimensional hyperplane that separates data of different labels
with a margin that is as wide as possible (Cristianini and
Shawe-Taylor, 2000).

Step 3: computing FABS scores

After a bipartition algorithm has been applied on $G$, all feature
vectors in the graph are partitioned into $S$ and $\bar{S}$. In the third step of
FABS-A, a scalar score, $\mathrm{FABS}_{P_i}$, is calculated for each population
set $P_i$. $\mathrm{FABS}_{P_i}$ is the ratio of the number of feature vectors in
$P_i$ that fall in the set $S$ to the total number of feature vectors in $P_i$.
Formally,

$$\mathrm{FABS}_{P_i} = \frac{|S\cap P_i|}{|P_i|}.$$

This is shown pictorially in Figure 1. The FABS scores of the
populations are then used to rank them: the higher the FABS
score, the closer the population is to $R_1$. The FABS scores are
therefore ordered so that
$\mathrm{FABS}_{P_{\pi(1)}} \ge \mathrm{FABS}_{P_{\pi(2)}} \ge \cdots \ge \mathrm{FABS}_{P_{\pi(k)}}$,
where $(\pi(1),\ldots,\pi(k))$ is a permutation of $(1,2,\ldots,k)$. The ranking
of the populations is then given by $(\pi(1),\ldots,\pi(k))$.
The entire procedure is summarized in Algorithm 1.

Algorithm 1 FABS-A

Inputs: The feature vectors $\{v_1,\ldots,v_n\}$ extracted from images
(possibly after PCA pre-processing), and their corresponding
population sets $\{P_1,\ldots,P_k\}$; the training data (or extreme sets)
$\{R_1,R_2\}$

Step 1: Construct $G=(V,E,l,w)$, a complete graph from feature
vectors;

Step 2: Use a bipartitioning algorithm A to find a bipartition $(S,\bar{S})$
on $G$ such that $R_1\subseteq S$ and $R_2\subseteq\bar{S}$;

Step 3: $\forall P_i$, calculate $\mathrm{FABS}_{P_i} = |S\cap P_i|/|P_i|$;

Step 4: The FABS scores are ordered so that
$\mathrm{FABS}_{P_{\pi(1)}} \ge \mathrm{FABS}_{P_{\pi(2)}} \ge \cdots \ge \mathrm{FABS}_{P_{\pi(k)}}$,
where $(\pi(1),\ldots,\pi(k))$ is a permutation
of $(1,2,\ldots,k)$. The ranking of the populations is then given by
$(\pi(1),\ldots,\pi(k))$;

Output: An ordered array of population sets based on their FABS
score, $\{R_1,P_{\pi(1)},\ldots,P_{\pi(k)},R_2\}$.
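
Steps 3 and 4 of Algorithm 1 reduce to counting and sorting, as the following sketch shows. The node indices and the set `S` are hypothetical, chosen so that the scores reproduce the values quoted in the caption of Fig. 1 (1, 2/3, 1/3 and 0); the function names are assumptions.

```python
from fractions import Fraction

def fabs_scores(populations, S):
    """FABS score per population: |S ∩ P_i| / |P_i|, as an exact fraction."""
    return {name: Fraction(len(S & members), len(members))
            for name, members in populations.items()}

def rank_populations(scores):
    """Population names ordered from the highest to the lowest FABS score."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)

# Hypothetical node indices for the cells treated with each drug.
populations = {
    "drug A": {0, 1, 2},
    "drug B": {3, 4, 5},
    "drug C": {6, 7, 8},
    "drug D": {9, 10, 11},
}
# Hypothetical side of the cut that contains the positive controls R1.
S = {0, 1, 2, 3, 4, 6}

scores = fabs_scores(populations, S)
ranking = rank_populations(scores)
```

Exact fractions are used only to keep the toy values readable; floating-point division would serve equally well on real data.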



2.2 Significance test 

One can further use the FABS scores to test the statistical significance
of the difference between the effects of two drugs. The idea is to
apply bootstrapping to obtain FABS scores from a large number of
resampling trials and then perform a hypothesis test on the difference
between the distributions of FABS. Algorithm 2 gives the test procedure,
which takes the resulting FABS from repeated experiments and calculates
P-values from a t-test for each pair of drugs. The obtained P-value is
then transformed into a log score $-\log p$.

To see if the t-test is appropriate, there are several important
assumptions to check. First, the sets of FABS of two drugs must
each be normally distributed. We plotted a histogram of FABS
scores obtained by our FABS-SVM implementation and observed


Fig. 1. (a) The input with the feature vectors of images associated
with positive and negative controls $R_1$ and $R_2$ and four different
drugs: drug A, drug B, drug C and drug D; (b) The bipartition
boundary after the cut is found: if $R_2$ contains negative controls, such
as the completely fragmented state of mitochondria for the toxicity
criterion, while $R_1$ contains positive controls, representing cells in a
desired normal healthy state with mitochondria rescued from complete
fragmentation, then $\mathrm{FABS}_{\text{drug A}} = 1$, $\mathrm{FABS}_{\text{drug B}} = 2/3$,
$\mathrm{FABS}_{\text{drug C}} = 1/3$ and $\mathrm{FABS}_{\text{drug D}} = 0$. Our ranking of the
drugs will be: drug A >> drug B >> drug C >> drug D, where x >> y
indicates that x is more effective than y.



that the distributions for each drug in our test data are roughly
bell shaped. In addition, for z-IETD and z-LEHD, the P-values
obtained through the Jarque-Bera test (Jarque and Bera, 1987) are 0.5
and 0.0718, respectively, indicating approximate normality for both.
Another assumption is that the variance of each group must be equal.
Though this is usually not the case in drug profiling applications, the
t-test is robust against unequal variances if the sample sizes are
approximately equal for each group, which can be enforced in
drug profiling applications. Other assumptions, such as that sample
means and sample variances must be statistically independent, can
be relaxed when the sample is moderately large or larger, which
is always the case for HCS. Consequently, the t-test is appropriate for
our purposes. When the number of populations is high, we can apply
the Bonferroni correction to avoid errors due to multiple comparisons.



Algorithm 2 Significance test

Step 1: Collect FABS from all subsampling trials for each drug, i.e.
randomly sample certain percentages of controls and drugs with
replacement from the original database repeatedly and calculate
the FABS score per drug each time;

Step 2: Perform a t-test on FABS obtained with any two different
drugs. The t-test of drug A and drug B returns a P-value,
$p(\text{drug A}, \text{drug B})$;

Step 3: Return $-\log p(\text{drug A}, \text{drug B})$.
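
A rough sketch of Algorithm 2 follows, under several stated assumptions. For brevity the cut is held fixed and only the cells of each drug are resampled, whereas Algorithm 2 recomputes FABS on each resample of controls and drugs. The statistic is the unequal-variance (Welch) form, the P-value uses a normal approximation instead of a t distribution, and the logarithm is taken in base 10; none of these choices is specified in the text, and all names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance

def bootstrap_fabs(in_S, n_trials, rng):
    """FABS scores from n_trials resamples (with replacement) of one drug.

    `in_S` holds 1 for a cell on the positive-control side of the cut,
    else 0. SIMPLIFICATION: the cut is fixed; only cells are resampled.
    """
    n = len(in_S)
    return [sum(rng.choices(in_S, k=n)) / n for _ in range(n_trials)]

def neg_log_p(scores_a, scores_b):
    """-log10 of a two-sided P-value for the difference in mean FABS.

    ASSUMPTION: Welch statistic with a normal approximation to its
    distribution, which is adequate only for a large number of trials.
    """
    se = math.sqrt(variance(scores_a) / len(scores_a)
                   + variance(scores_b) / len(scores_b))
    t = (mean(scores_a) - mean(scores_b)) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return -math.log10(max(p, 1e-300))  # floor avoids log(0) on underflow

rng = random.Random(0)
drug_a = [1] * 80 + [0] * 20   # hypothetical: 80% of cells fall in S
drug_b = [1] * 50 + [0] * 50   # hypothetical: 50% of cells fall in S
trials_a = bootstrap_fabs(drug_a, 200, rng)
trials_b = bootstrap_fabs(drug_b, 200, rng)
score = neg_log_p(trials_a, trials_b)
```

A larger returned value means stronger evidence that the two drugs differ. With many pairwise comparisons, the P-value would be adjusted (for example by the Bonferroni correction mentioned above) before taking the logarithm.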



2.3 Data preparation 

We use a subset of a large image database of Chinese Hamster
Ovary cells published in Peng et al. (2011). The cells are divided
into four groups according to the drug treatments they have 
received — control, squamocin, squamocin and z-IETD (shortened 
as z-IETD), and squamocin and z-LEHD (shortened as z-LEHD). 
Squamocin is known to induce mitochondrial fragmentation and 
cell apoptosis (i.e. programmed cell death). z-IETD and z-LEHD 
are inhibitors of caspases that play important roles in mitochondrial 
fragmentation. The goal of the study was to investigate whether

Fig. 2. Example cell images show different fragmentation stages of
mitochondria, tagged with a fluorescent dye. Images in the bottom row
are cells with completely fragmented mitochondria, those in the top row
are cells without fragmentation, and those in the middle are partially
fragmented. From Lin et al. (2010).

z-IETD and z-LEHD can recover mitochondria from
squamocin-induced fragmentation. Figure 2 shows some example cell images of
mitochondria at different fragmentation stages. Intact mitochondria 
usually appear like threads, as shown in the images at the top 
row, whereas fragmented mitochondria appear like small globules 
as shown in the bottom row. Even though the totally intact and
totally fragmented mitochondria (extreme set cases) can be easily
distinguished by visual inspection, it is very hard (if not impossible)
to look at sets of mitochondria images that are neither totally
intact nor totally fragmented (e.g. each set representing a population
of cells treated by a certain drug), distinguish between these
population sets, determine which extreme set each is closest to and
judge how they compare against each other in terms of level of
fragmentation. Another challenge is to automate this process. Automation
is critical because the available biological data sets are very
large and screening them manually would be a very time-consuming
and laborious task.

The challenge is to quantify and rank partial fragmentation as
shown in the middle row. Peng et al. (2011) concluded that z-LEHD
was more effective than z-IETD in rescuing mitochondria
from squamocin-induced fragmentation. This conclusion was used
as the ground truth to assess the prediction accuracy of different
methods later, and images of cells treated by squamocin and of
control cells were used as extreme cases.

Our database contains 257 images of cells treated with squamocin, 
239 with z-IETD, 262 with z-LEHD and 238 control. We applied 
a feature extraction method to extract 135 features from each 
cell image to form the feature vector to represent each cell. This 
feature extraction method is the same as the one that was used 
to extract strong detectors from cell images to determine protein
subcellular localization as described by Lin et al. (2007). Strong
detectors include general purpose features derived from image
transformations, such as Haralick texture features and geometric
features of the objects extracted from the input image. These features
have been shown to be useful in problems like recognizing
fluorescent patterns of subcellular organelles in protein subcellular
localization (Huang and Murphy, 2004).



3 RESULTS 

3.1 Formal Analysis of FABS-A

Here, we formally define the drug ranking problem and report a
bias-variance analysis of FABS-A as a solution to this problem. The drug
ranking problem can be considered as a regression problem, where
given a multi-dimensional observation $v_i = X \in \mathbb{R}^d$, we assume that a
quantity $Y \in [-1,+1]$ is associated with $X$ as our target metric of $X$.
A solution of this regression problem is to learn a regression model
from examples that computes $Y$ given $X$. With the metric quantity $Y$,
given two treatments $a$ and $b$ with population distributions $V_a$ and
$V_b$, respectively, if

$$E_{V_a}(Y \mid X) - E_{V_b}(Y \mid X) \ge 0, \qquad (1)$$

then treatment $a$ will be considered to be more effective than
treatment $b$, assuming that $Y = +1$ is the desired phenotypic
outcome.

However, it is usually infeasible to manually assign a score $Y$ to a
sufficient number of training examples consistently. Instead, FABS-A
simplifies the problem to a bipartition problem. In our bipartition
scheme, our model will assign $Y_c = 1$ to a given $X$ if $Y \ge 0$ and
$Y_c = -1$ otherwise, and then use the empirical population mean of $Y_c$
as the estimated population mean of $Y$. In a drug screening application,
this quantity will be used to rank the effectiveness of a treatment.

More formally,

$$Y_c = Y + \mathrm{compl}(Y),$$

where

$$\mathrm{compl}(Y) = \begin{cases} 1 - Y & \text{if } Y \ge 0 \\ -1 - Y & \text{if } Y < 0. \end{cases}$$

Instead of directly comparing the expectation of $Y$, FABS-A
compares the expectation of $Y_c$ to determine which treatment is
more effective:

$$E_{V_a}(Y_c \mid X) - E_{V_b}(Y_c \mid X) \ge 0. \qquad (2)$$

Like $Y$, $Y_c$ is unknown and must be estimated with a model learned
from data. Let $\hat{Y}_c$ be the estimate of $Y_c$. Then

$$\hat{Y}_c = \begin{cases} Y + \mathrm{compl}(Y) & \text{if correctly classified} \\ +1 & \text{if incorrect and } Y < 0 \\ -1 & \text{if incorrect and } Y \ge 0. \end{cases}$$



An analysis of bias and variance of the bipartition scheme is as
follows. The absolute error made by bipartition instead of regression
is

$$|Y - \hat{Y}_c| = |\Delta Y_c| = \begin{cases} |\mathrm{compl}(Y)| = 1 - |Y| & \text{if correct} \\ 1 + |Y| & \text{otherwise.} \end{cases}$$

Let $\varepsilon$ be the classification error rate of the bipartition model. Then

$$E(|\Delta Y_c|) = (1-\varepsilon)\,(1 - E(|Y|)) + \varepsilon\,(1 + E(|Y|)) = 1 + (2\varepsilon - 1)\,E(|Y|) \le 1 \quad (\text{when } \varepsilon \le 0.5).$$

The expectation of the absolute error is bounded by one even when
we use a weak classifier for the bipartition that simply guesses a
label randomly.

The variance of the absolute error is

$$\mathrm{Var}(|\Delta Y_c|) = E(|\Delta Y_c|^2) - (E(|\Delta Y_c|))^2 = 4\,E(|Y|)^2\,\varepsilon(1-\varepsilon),$$

which turns out to be the variance of a Bernoulli trial scaled by the
square of the expected magnitude of $Y$. Again, this is bounded by 1,
attained when $\varepsilon = 0.5$ and $E(|Y|) = 1$.

Next, we consider the expectation of $\hat{Y}_c$, which is interesting
because we can infer the expected difference between regression
(equation (1)) and bipartition (equation (2)).

$$\Delta Y_c = \hat{Y}_c - Y = \begin{cases} 1 - Y & \text{if } Y \ge 0 \text{ and correctly classified} \\ -1 - Y & \text{if } Y < 0 \text{ and correctly classified} \\ -1 - Y & \text{if } Y \ge 0 \text{ and incorrect} \\ 1 - Y & \text{if } Y < 0 \text{ and incorrect.} \end{cases}$$

Let P+=Pr(7>0|X), the probability that 7^0 and Y = E(Y\X). 
We have 

E(a£)=(1 - e)P+(l - 7) + (l - -P+)(-l - 7)+ 

eP+(-l-Y)+e(l-P+)(l-Y) 

=(2-Ae)P+-\+2e-Y. 

The result above implies that when we have a weak classifier with
$\varepsilon \to 0.5$, $E(\Delta Y_c) \to -\bar{Y}$ and $E(Y_c) = \bar{Y} + E(\Delta Y_c) = 0$. That is,
regardless of the population, random guessing does not distinguish
between populations and provides no discerning
power. In contrast, when we have a perfect classifier with $\varepsilon \to 0$,
$E(Y_c) \to 2P_+ - 1$, which rescales the true probability of $Y \ge 0$
for the population to $[-1, 1]$, exactly as desired.
Consequently, given an accurate bipartition algorithm, FABS-$\mathcal{A}$ can
reasonably approximate the effectiveness of drugs without exact scores of
the effectiveness.
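The limiting behaviour described above can be reproduced with a few lines of arithmetic. The following is a minimal sketch with hypothetical function names; it simply evaluates the four-case sum for $E(\Delta Y_c)$ and adds $\bar{Y}$ back, so the inputs `eps`, `p_plus` and `y_bar` stand for $\varepsilon$, $P_+$ and $\bar{Y}$.

```python
def expected_delta(eps, p_plus, y_bar):
    """E(dY_c) as the probability-weighted sum of the four cases in the text."""
    cases = [
        ((1 - eps) * p_plus, 1 - y_bar),         # Y >= 0, correctly classified
        ((1 - eps) * (1 - p_plus), -1 - y_bar),  # Y < 0, correctly classified
        (eps * p_plus, -1 - y_bar),              # Y >= 0, incorrect
        (eps * (1 - p_plus), 1 - y_bar),         # Y < 0, incorrect
    ]
    return sum(prob * delta for prob, delta in cases)


def expected_score(eps, p_plus, y_bar):
    """E(Y_c) = y_bar + E(dY_c); algebraically (1 - 2 eps)(2 p_plus - 1)."""
    return y_bar + expected_delta(eps, p_plus, y_bar)


if __name__ == "__main__":
    for eps in (0.0, 0.1, 0.3, 0.5):
        print(eps, expected_score(eps, p_plus=0.8, y_bar=0.3))
```

The expected score shrinks linearly toward zero as the error rate approaches 0.5 and does not depend on `y_bar`, which matches the statement that random guessing provides no discerning power.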

3.2 Performance of ranking 

We compared the performance of FABS-NC' with other
baselines that have been used in HCS: center ranking, PCA ranking
and the graph transition energy method (GTEM) (Lin et al., 2010).
Center ranking first finds the center, which can be the mean, the
median or any other measure of the center, of all feature vectors
associated with a particular drug or an extreme case, then calculates
the distance, such as the Euclidean distance, between all pairs of centers.
The drugs are then ranked by ordering their centers
from the closest to the farthest from the center
of the desired extreme case (such as the completely fragmented
state for the toxicity criterion). PCA ranking is similar to center ranking,
except that it first projects the feature vectors onto the first few important
principal components, then performs center ranking. GTEM (Lin
et al., 2010) is also a graph-based approach. GTEM defines graph
transition energy as the distance metric and utilizes a spectral
graph theoretic regularization to transform the feature space so that
extreme cases are separated widely before it ranks populations of
cells under different treatments.
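Center ranking as described above can be sketched in a few lines. This is an illustrative sketch rather than the implementation used in the study: it assumes the center is the component-wise median and the distance is Euclidean, and the names `center` and `center_ranking` and the toy data are hypothetical.

```python
import math
from statistics import median


def center(vectors):
    """Component-wise median of a list of equal-length feature vectors (assumed center)."""
    return [median(column) for column in zip(*vectors)]


def center_ranking(drug_vectors, extreme_vectors):
    """Order drug names from closest to farthest from the extreme-case center.

    drug_vectors: dict mapping drug name -> list of feature vectors
    extreme_vectors: list of feature vectors of the desired extreme case
    """
    target = center(extreme_vectors)
    distance = {
        name: math.dist(center(vectors), target)
        for name, vectors in drug_vectors.items()
    }
    return sorted(distance, key=distance.get)


if __name__ == "__main__":
    drugs = {
        "drug_A": [[0, 0], [1, 1], [0, 1]],
        "drug_B": [[5, 5], [6, 5], [5, 6]],
    }
    extreme = [[6, 6], [6, 6]]
    print(center_ranking(drugs, extreme))
```

PCA ranking would apply the same function after projecting every feature vector onto the retained principal components.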

In addition to using NC' [solved with Hochbaum's PseudoFlow
algorithm, HPF, the implementation of which is obtained from
(Chandran and Hochbaum, 2009; Hochbaum, 2008, 2010a)] as
our bipartition algorithm in the FABS framework, we also tested
other bipartition procedures. One classical technique is the SVM
(Burges, 1998; Cristianini and Shawe-Taylor, 2000). When using
SVM for FABS, we satisfy Requirement 1 by using the positive and
negative controls as training data: all $R_1$ points are in $S$ and
all $R_2$ points are in $\bar{S}$. In this particular
implementation (FABS-SVM), the kernel is the radial basis
function, the C value is $10^4$ and the
kernel parameter is 1. The implementation package used is LIBSVM
(Chang and Lin, 2011).

Another approach, often used in image segmentation, is based
on finding the Fiedler eigenvector of the graph (referred to as the
spectral technique) as a heuristic solution for the normalized cut
problem (Shi and Malik, 2000). The spectral technique, however,
is unsupervised, and thus does not satisfy Requirement 1. To
resolve this issue, we modified the weights of the graph to ensure
that Requirement 1 is satisfied. The implementation package used
is Normalized Cuts Segmentation Code (Timothee Cour, 2004).
However, its performance was much worse than that of all other methods,
and it was removed from the results.

The comparative study that we performed used the median for all
center measures and the Euclidean distance for all distance measures.
The edge weight between two feature vectors $v_i$ and $v_j$ decreases
as the distance between them increases
and is quantified by $w_{ij} = e^{-\|v_i - v_j\|_2} + \epsilon$, for $0 < \epsilon \ll 1$.
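The edge-weight rule can be sketched as follows. The extracted formula does not make clear whether the distance in the exponent is squared, so this sketch assumes the unsquared Euclidean distance; the value of the small constant `eps`, the zero diagonal and the function names are likewise assumptions made for illustration.

```python
import math


def edge_weight(v_i, v_j, eps=1e-6):
    """Similarity weight exp(-||v_i - v_j||) + eps (unsquared distance assumed)."""
    return math.exp(-math.dist(v_i, v_j)) + eps


def weight_matrix(vectors, eps=1e-6):
    """Symmetric weight matrix for a list of feature vectors.

    The diagonal is set to 0.0 (no self-loops), which is an assumption
    of this sketch and not stated in the text.
    """
    n = len(vectors)
    return [
        [edge_weight(vectors[i], vectors[j], eps) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    for row in weight_matrix([[0, 0], [3, 4], [0, 1]]):
        print(row)
```

The additive constant keeps every weight strictly positive even when the exponential underflows for very distant vectors.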

Prior to feeding the input feature vectors extracted from the
images into FABS-$\mathcal{A}$, we pre-process these vectors to transform
them from a high-dimensional space to a space of fewer dimensions,
preserving only the dimensions that are of the most significance to
our experiment. The dimension reduction is performed by using
PCA, and the number of principal components used is determined by
adding the largest number of most significant principal components
that explain up to 80% of the total variation in the dataset considered.
We also tested whether applying GTEM's feature transformation step
as a pre-processing step before applying FABS-NC' improves the
performance.
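The component-selection rule can be illustrated on a list of per-component variances. This sketch reads "up to 80%" as "the largest number of leading components whose cumulative share of the variance does not exceed 80%", keeping at least one component; that reading, the tolerance and the function name are assumptions.

```python
def num_components(variances, threshold=0.80):
    """Largest k such that the top-k components explain at most `threshold`
    of the total variance (at least one component is always kept)."""
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    k = 0
    for variance in ordered:
        # Small tolerance so that an exact 80% is not lost to rounding error.
        if (cumulative + variance) / total > threshold + 1e-12:
            break
        cumulative += variance
        k += 1
    return max(k, 1)


if __name__ == "__main__":
    # Hypothetical eigenvalues of a covariance matrix.
    print(num_components([5.0, 3.0, 1.0, 1.0]))
```

If the intended rule were instead "the fewest components reaching at least 80%", the comparison inside the loop would need to change accordingly.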

To guarantee the statistical validity of our comparison, we
subsampled the available cell images from the entire database, i.e.
we drew samples with replacement, for a certain percentage of
the database, to test the methods. The subsampling percentages 30, 60,
70 and 80% were tried for drug images (501 images). For each fixed
drug percentage, we varied the percentage of labeled controls,
increasing it from 10% to 100%, to see the effect of the number of
labeled controls on the final prediction accuracy of the ranking (495
images in total). The subsampling trials were performed 1000 times for
each combination. The prediction accuracy of any ranking method is
the fraction of correctly ranked trials out of the grand total of
1000 trials; this can be determined because we have the ground truth.
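The evaluation protocol above can be sketched as a small resampling loop. This is a hedged illustration, not the authors' code: `rank_fn` stands for any ranking method, the data layout (dicts of lists) is hypothetical, and the normal-approximation interval in `normal_ci95` is an assumption, since the text does not say how the confidence intervals were computed.

```python
import math
import random


def subsample(items, fraction, rng):
    """Draw round(fraction * len(items)) items with replacement (at least one)."""
    k = max(1, round(fraction * len(items)))
    return rng.choices(items, k=k)


def prediction_accuracy(rank_fn, drugs, controls, truth,
                        drug_frac, control_frac, trials=1000, seed=0):
    """Fraction of trials in which rank_fn reproduces the ground-truth ranking."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        drug_sample = {name: subsample(images, drug_frac, rng)
                       for name, images in drugs.items()}
        control_sample = {name: subsample(images, control_frac, rng)
                          for name, images in controls.items()}
        if rank_fn(drug_sample, control_sample) == truth:
            correct += 1
    return correct / trials


def normal_ci95(p, n):
    """95% normal-approximation interval for a proportion (assumed CI method)."""
    half_width = 1.96 * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    # Toy data: each "image" is a single number; the ranking is by mean value.
    drugs = {"drug_A": [0.10, 0.20, 0.15], "drug_B": [0.90, 0.80, 0.95]}
    controls = {"positive": [1.0], "negative": [0.0]}
    by_mean = lambda d, c: sorted(d, key=lambda name: sum(d[name]) / len(d[name]))
    acc = prediction_accuracy(by_mean, drugs, controls, ["drug_A", "drug_B"], 0.6, 0.5)
    print(acc, normal_ci95(acc, 1000))
```

Fixing the seed makes a run repeatable, which is convenient when comparing several ranking methods on identical subsamples.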

Figures 3 and 4 graphically summarize the results of the
experiment. Each graph shown is for a fixed drug percentage
(testing percentage) of 30, 60, 70 or 80%. The x-axis is the percentage of
labeled controls used, whereas the y-axis displays the average prediction
accuracy over the 1000 trials described above.

Each curve in the graphs indicates a particular ranking method:
FABS-NC', FABS-SVM, Center ranking, PCA ranking and
GTEM. The results of FABS-Spectral are poor with our particular
implementation and are omitted from the figures. The vertical lines in Figures 3
and 4 are 95% confidence intervals for the accuracy of each ranking
method.

Table 1. Matrices of GDM between different pairs of
drugs for different implementations of FABS-SVM and
FABS-NC'

Fig. 3. The accuracy comparison among different ranking methods. The
vertical bars in the graph are 95% confidence intervals. The testing
percentages used are: 30 and 60%. The x-axis is the percentage of labeled
positive and negative controls used (495 images in total).

Fig. 4. (Continued) the testing percentages used are: 70 and 80%. The x-axis
is the percentage of labeled positive and negative controls used (495 images
in total).

For all testing percentages, the prediction accuracy of FABS-NC'
steadily increases as more labeled controls become available,
especially when more images are tested (70 and 80%): the slope
increases and then levels off from left to right. The overall
accuracy is nearly 98% for all graphs at the end of the x-axis,
indicating that the method is highly accurate with as few as 500
labeled controls. It is also robust, considering that the trend of the
prediction curve remains the same for different testing percentages.

Moreover, we can see that FABS-NC' has an advantage over the
other ranking methods for this particular mitochondria dataset. Its
curve is often above those of all other methods, except for 10% labeled
controls; testing percentage 70%: 70% labeled controls; and testing
percentage 80%: 10% and the interval 40-50%. Notice that for the
lowest testing percentage (30%), FABS-NC' outperforms all other
methods: when using all labeled controls for ranking, its accuracy exceeds
that of the next best algorithm by more than half.

Overall, FABS-SVM also performs well, although sometimes
trailing behind FABS-NC' by a large margin. PCA ranking
performs poorly when testing images are few (30%). Center
ranking is generally of low quality, giving low accuracy for
all testing percentages. Notice, however, that GTEM gives the best
results when the number of labeled controls is very low (10%),
indicating its usefulness when training data are few. Nevertheless,
its advantage diminishes as more labeled training cases become
available, and its rankings are inaccurate compared to FABS. The
results show that applying the feature transformation step of GTEM
as a pre-processing step of FABS-NC' performs better than GTEM,
but not as well or as stably as FABS-NC' alone.

The experimental results suggest that, overall, FABS with the
NC' implementation is the best ranking method among all for
this particular mitochondria database. Remarkably, FABS-NC'
generalizes better than any other method as more training and test
examples become available.

3.3 Significance 

Table 1 displays the significance score $-\log P$ between different
pairs of drugs for the FABS-NC' and FABS-SVM implementations
when we subsampled 30% of the labeled controls and 30% of the
drug treatment results. An infinite score ($\infty$) is obtained when $P$
is very close to zero, indicating that the distance between the two
corresponding drugs is very large. The results show that FABS-NC'
is more discriminant than FABS-SVM because the significance scores
for FABS-NC' are larger than those for FABS-SVM.

We also performed a Monte Carlo simulation to test whether the
observed difference between the FABS-NC' scores of 30% of the Z-IETD and
Z-LEHD data, using 80% of the control data for training, is significant
against pairs of null data sets sampled from the same drug treatment
populations. In 1000 random resamplings, no difference between the scores
of a null data set pair is higher than the observed score, yielding
a P-value close to zero.
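The resampling test described above can be sketched as follows. This is a minimal illustration under stated assumptions: `score_fn` stands in for the FABS-NC' score of a sample (here any function from a list of images to a number), null pairs are drawn with replacement from one population, and the absolute difference is compared with the observed one. All names are hypothetical.

```python
import random


def monte_carlo_p_value(population, score_fn, observed_diff,
                        fraction=0.3, trials=1000, seed=0):
    """Fraction of null pairs whose score difference reaches the observed one.

    Each null pair consists of two subsamples drawn with replacement from
    the same population, so any score difference is due to sampling alone.
    """
    rng = random.Random(seed)
    k = max(1, round(fraction * len(population)))
    exceed = 0
    for _ in range(trials):
        sample_a = rng.choices(population, k=k)
        sample_b = rng.choices(population, k=k)
        if abs(score_fn(sample_a) - score_fn(sample_b)) >= observed_diff:
            exceed += 1
    return exceed / trials


if __name__ == "__main__":
    mean = lambda values: sum(values) / len(values)
    null_population = [i / 100 for i in range(100)]  # toy scores in [0, 1)
    print(monte_carlo_p_value(null_population, mean, observed_diff=0.5))
```

With 1000 resamplings the smallest non-zero P-value this estimate can report is 0.001, so a result of zero should be read as "below the resolution of the simulation".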

3.4 Comparison of running time 

In this section, we compare the running times of three FABS-$\mathcal{A}$
procedures, where $\mathcal{A}$, as mentioned in previous sections, is
one of the bipartition algorithms NC' (Hochbaum, 2010b),
SVM (Cristianini and Shawe-Taylor, 2000) and Spectral (Shi and
Malik, 2000), among themselves and against PCA ranking, Center
ranking and GTEM. The computer environment
for this comparison is a Windows computer with a 2.40 GHz Intel(R)
Core(TM)2 Duo CPU and 2 GB of memory.

Figures 5 and 6 display the running times of the various methods
for different testing percentages, excluding the times for subsampling
(which have a median of 0.01 s, a maximum of 0.02 s and a minimum of
0.006 s). The x-axis increases with the number
of positive controls and negative controls used, representing more
and more training data becoming available, while the y-axis is the
running time. The six curves in the figures are the different methods,
including the various implementations of FABS-$\mathcal{A}$; notice that
FABS-NC' is represented by the thickest curve. There are 501 testing data:
265 Z-IETD and 291 Z-LEHD.

From the figures, among the FABS-$\mathcal{A}$ procedures, we can observe that for
all testing percentages considered, FABS-Spectral takes the most
running time, lagging behind both FABS-NC' and FABS-SVM by
large margins. For FABS-NC', the running time steadily lengthens
as the number of positive and negative controls increases, although
not as dramatically as for FABS-SVM, whose running time, shorter than
those of the other procedures initially, grows exponentially: in one case
(testing percentage 70%), running 100% of the positive and negative
controls takes around 1000 times as long as running
10% of the positive and negative controls. Compare this with
FABS-NC': for the same testing percentage, using all positive and
negative controls requires only twice as much running time as
using only 10% (10% corresponds to around 50 controls in
total, a relatively small number of images that can be obtained through
HCS). This observation, combined with the results from Section 3.2,
indicates that even though FABS-SVM has the initial advantage
in running time, this is offset by the initially more accurate results
produced by FABS-NC'. Moreover, it appears that FABS-NC' scales
much better with increasing input data than FABS-SVM. Looking
at the other methods besides FABS-$\mathcal{A}$, we can observe that GTEM
takes a relatively long time, on a par with FABS-Spectral. This is in
contrast with PCA ranking and center ranking, whose running times
are the lowest among all methods. This result is expected: since
FABS-$\mathcal{A}$ uses PCA for pre-processing (i.e. the cost of PCA is already
included in its computational cost), FABS-$\mathcal{A}$ can
only take longer than PCA ranking. However, from Section
3.2 it is clear that this extra computational cost brings significant
improvements in accuracy, which, combined with the scalability of
FABS-NC', makes FABS-NC', overall, an attractive candidate for
ranking this database.

Fig. 5. The running time comparison among different methods. The testing
percentages used are: 30 and 60%. Recoverable panel details: title "30 % Testing
(501 Images in Total)"; y-axis: running time on a logarithmic scale from 1.E-03
to 1.E+03; x-axis: percentage of labeled positive and negative controls used
(495 images in total); curves: FABS-NC', FABS-SVM, Center Ranking, PCA Ranking,
GTEM and FABS-Spectral.

Fig. 6. (Continued) the testing percentages used are: 70 and 80%



4 DISCUSSION 

In this article, we describe a new drug ranking framework called
FABS. It is graph based and produces a single scalar score for each
drug for ranking. The formulation and solution sidestep many
pitfalls of other traditional methods. The article also reports the
FABS-NC' semi-supervised implementation and its comparative study.
Not only is this implementation better than four other considered
methods, it also outperforms the FABS-SVM and FABS-Spectral
implementations on a mitochondria database. This preliminary
result suggests that FABS-NC' is good for ranking the toxicity of drugs
targeting mitochondria for a specific database.

There are some advantages of our measure. First, FABS is
one-dimensional, that is, a single scalar, giving an unambiguous way to
rank drugs. Its computation considers all samples of each drug and
uses a fraction as the final score. This diminishes the effect of outliers
and noise because, if the number of images is large for each drug,
as in the case of HCS, outliers, which are few in number, cannot
influence the result (a fraction) in a significant way. The same
reasoning explains the noise reduction. More importantly, our measure
FABS-NC' is acquired through a combinatorial algorithm, which is
efficient. This is essential since the number of cells observed in an
HCS is large and the applicability of any metric learning algorithm
is greatly reduced if it cannot process them sufficiently fast. The
last noteworthy advantage of our framework is that the training data
for the semi-supervised formulation are the positive and negative
controls, which are easily recognizable and obtained without
time-consuming annotation, sidestepping the limitation of training sample
size of many metric learning algorithms.
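The robustness argument for a fraction-based score can be made concrete with a toy calculation. The sketch below is not the FABS definition; it only assumes that each image ends up on one side of a bipartition (a boolean label) and that the score is the fraction on the positive side, so that flipping a handful of outlier images moves the score by at most their share of the sample. The names and numbers are hypothetical.

```python
def fraction_positive(labels):
    """Fraction of samples assigned to the positive side of a bipartition."""
    return sum(1 for label in labels if label) / len(labels)


def flip_first(labels, count):
    """Return a copy of labels with the first `count` entries inverted,
    imitating `count` outlier images landing on the wrong side."""
    return [not label for label in labels[:count]] + list(labels[count:])


if __name__ == "__main__":
    labels = [True] * 700 + [False] * 300   # toy drug with 1000 images
    clean = fraction_positive(labels)
    noisy = fraction_positive(flip_first(labels, 10))
    print(clean, noisy, abs(clean - noisy))
```

With 1000 images, ten misplaced outliers shift the fraction by one percentage point, which illustrates why a large number of images per drug damps their influence.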

Our future work includes investigating whether the introduction
of node weights in our construction of the graph in Step
2 of Algorithm 1 will benefit the prediction results. This is
especially relevant because of a recent development for solving a
generalized version of NC' utilizing node weights (Hochbaum,
2010d). Moreover, we could also expand our FABS application to
other criteria and situations for determining the ranking of drugs,
and test it on more databases as they become available to assess the
effectiveness of our method.